Domain Adaptation on Graphs by Learning Graph Topologies: Theoretical Analysis and an Algorithm
Traditional machine learning algorithms assume that the training and test
data have the same distribution, an assumption that does not necessarily
hold in real applications. Domain adaptation methods account for such
deviations in the data distribution. In this work, we study the problem of
domain adaptation on graphs. We consider a source graph and a target graph
constructed with samples drawn from data manifolds. We study the problem of
estimating the unknown class labels on the target graph using the label
information on the source graph and the similarity between the two graphs. We
particularly focus on a setting where the target label function is learnt such
that its spectrum is similar to that of the source label function. We first
propose a theoretical analysis of domain adaptation on graphs and present
performance bounds that characterize the target classification error in terms
of the properties of the graphs and the data manifolds. We show that the
classification performance improves as the topologies of the graphs get more
balanced, i.e., as the numbers of neighbors of different graph nodes become
more proportionate, and weak edges with small weights are avoided. Our results
also suggest that graph edges between too distant data samples should be
avoided for good generalization performance. We then propose a graph domain
adaptation algorithm inspired by our theoretical findings, which estimates the
label functions while learning the source and target graph topologies at the
same time. The joint graph learning and label estimation problem is formulated
through an objective function relying on our performance bounds, which is
minimized with an alternating optimization scheme. Experiments on synthetic and
real data sets suggest that the proposed method outperforms baseline
approaches.
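In graph signal processing terms, the "spectrum" of a label function is its set of graph Fourier coefficients, i.e., its projection onto the eigenvectors of the graph Laplacian. A minimal numpy sketch of this notion on a hypothetical toy graph (this illustrates the representation, not the proposed algorithm):

```python
import numpy as np

def graph_spectrum(W, f):
    """Graph Fourier coefficients of a signal f on a graph with
    symmetric weight matrix W (combinatorial Laplacian eigenbasis)."""
    L = np.diag(W.sum(axis=1)) - W   # combinatorial Laplacian L = D - W
    _, U = np.linalg.eigh(L)         # columns of U: graph Fourier basis,
    return U.T @ f                   # ordered by increasing frequency

# Toy graph: two two-node clusters joined by one weak edge.
W = np.zeros((4, 4))
for i, j, w in [(0, 1, 1.0), (2, 3, 1.0), (1, 2, 0.1)]:
    W[i, j] = W[j, i] = w

f = np.array([1.0, 1.0, -1.0, -1.0])  # labels aligned with the clusters
coeffs = graph_spectrum(W, f)
# A label function that respects the graph structure concentrates its
# energy at the low-frequency end of the spectrum.
```

Matching the target spectrum to the source spectrum, as in the setting above, amounts to transferring this frequency profile from one graph to the other.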
A study of the classification of low-dimensional data with supervised manifold learning
Supervised manifold learning methods learn data representations by preserving
the geometric structure of data while enhancing the separation between data
samples from different classes. In this work, we propose a theoretical study of
supervised manifold learning for classification. We consider nonlinear
dimensionality reduction algorithms that yield linearly separable embeddings of
training data and present generalization bounds for this type of algorithm. A
necessary condition for satisfactory generalization performance is that the
embedding allow the construction of a sufficiently regular interpolation
function in relation to the separation margin of the embedding. We show that
for supervised embeddings satisfying this condition, the classification error
decays at an exponential rate with the number of training samples. Finally, we
examine the separability of supervised nonlinear embeddings that aim to
preserve the low-dimensional geometric structure of data based on graph
representations. The proposed analysis is supported by experiments on several
real data sets.
Out-of-sample generalizations for supervised manifold learning for classification
Supervised manifold learning methods for data classification map data samples
residing in a high-dimensional ambient space to a lower-dimensional domain in a
structure-preserving way, while enhancing the separation between different
classes in the learned embedding. Most nonlinear supervised manifold learning
methods compute the embedding of the manifolds only at the initially available
training points, while the generalization of the embedding to novel points,
known as the out-of-sample extension problem in manifold learning, becomes
especially important in classification applications. In this work, we propose a
semi-supervised method for building an interpolation function that provides an
out-of-sample extension for general supervised manifold learning algorithms
studied in the context of classification. The proposed algorithm computes a
radial basis function (RBF) interpolator that minimizes an objective function
consisting of the total embedding error of unlabeled test samples, defined as
their distance to the embeddings of the manifolds of their own class, as well
as a regularization term that controls the smoothness of the interpolation
function in a direction-dependent way. The class labels of test data and the
interpolation function parameters are estimated jointly with a progressive
procedure. Experimental results on face and object images demonstrate the
potential of the proposed out-of-sample extension algorithm for the
classification of manifold-modeled data sets.
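The paper's interpolator additionally exploits unlabeled test samples and a direction-dependent regularizer; a plain Gaussian RBF interpolator, sketched below with hypothetical function names and parameters, shows only the basic out-of-sample mechanism of mapping novel ambient-space points into a learned embedding:

```python
import numpy as np

def fit_rbf(X_train, Y_embed, sigma=1.0, reg=1e-8):
    """Fit Gaussian RBF coefficients mapping training samples (n x d,
    ambient space) to their learned embeddings (n x m)."""
    D2 = ((X_train[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    K = np.exp(-D2 / (2 * sigma ** 2))          # Gaussian kernel matrix
    return np.linalg.solve(K + reg * np.eye(len(X_train)), Y_embed)

def extend(X_new, X_train, C, sigma=1.0):
    """Out-of-sample extension: evaluate the interpolator at novel points."""
    D2 = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    return np.exp(-D2 / (2 * sigma ** 2)) @ C

rng = np.random.default_rng(0)
X_train = rng.standard_normal((5, 3))   # samples in the ambient space
Y_embed = rng.standard_normal((5, 2))   # their learned 2-D embedding
C = fit_rbf(X_train, Y_embed)
Y_back = extend(X_train, X_train, C)    # reproduces the training embedding
```

Any novel sample can then be passed to `extend` and classified in the embedding domain.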
Nonlinear Supervised Dimensionality Reduction via Smooth Regular Embeddings
The recovery of the intrinsic geometric structures of data collections is an
important problem in data analysis. Supervised extensions of several manifold
learning approaches have been proposed in recent years. However, existing
methods primarily focus on the embedding of the training data, while the
generalization of the embedding to initially unseen test data is largely
ignored. In this work, we build on recent theoretical results on the
generalization performance of supervised manifold learning algorithms.
Motivated by these performance bounds, we propose a supervised manifold
learning method that computes a nonlinear embedding while constructing a smooth
and regular interpolation function that extends the embedding to the whole data
space in order to achieve satisfactory generalization. The embedding and the
interpolator are jointly learnt such that the Lipschitz regularity of the
interpolator is imposed while ensuring the separation between different
classes. Experimental results on several image data sets show that the proposed
method outperforms traditional classifiers and competing supervised
dimensionality reduction algorithms in terms of classification accuracy in most
settings.
Tangent space estimation for smooth embeddings of Riemannian manifolds
Numerous dimensionality reduction problems in data analysis involve the
recovery of low-dimensional models or the learning of manifolds underlying sets
of data. Many manifold learning methods require the estimation of the tangent
space of the manifold at a point from locally available data samples. Local
sampling conditions such as (i) the size of the neighborhood (sampling width)
and (ii) the number of samples in the neighborhood (sampling density) affect
the performance of learning algorithms. In this work, we propose a theoretical
analysis of local sampling conditions for the estimation of the tangent space
at a point P lying on an m-dimensional Riemannian manifold S in R^n. Assuming a
smooth embedding of S in R^n, we estimate the tangent space T_P S by performing
a Principal Component Analysis (PCA) on points sampled from the neighborhood of
P on S. Our analysis explicitly takes into account the second order properties
of the manifold at P, namely the principal curvatures as well as the higher
order terms. We consider a random sampling framework and leverage recent
results from random matrix theory to derive conditions on the sampling width
and the local sampling density for an accurate estimation of tangent subspaces.
We measure the estimation accuracy by the angle between the estimated tangent
space and the true tangent space T_P S and we give conditions for this angle to
be bounded with high probability. In particular, we observe that the local
sampling conditions are highly dependent on the correlation between the
components in the second-order local approximation of the manifold. We finally
provide numerical simulations to validate our theoretical findings.
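The estimation procedure itself is simple to sketch: sample a neighborhood of P, run PCA, and measure the principal angle to the true tangent space. A toy instance on the unit sphere S^2 in R^3 (m = 2, n = 3; the sampling width below is an arbitrary illustrative choice, not a value from the analysis):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, width = 200, 0.1

# Points on the unit sphere near the north pole P = (0, 0, 1): small
# tangent perturbations projected back onto the sphere.
T = width * rng.uniform(-1.0, 1.0, size=(n_samples, 2))
X = np.column_stack([T[:, 0], T[:, 1], np.ones(n_samples)])
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Local PCA: top-m principal directions of the centered neighborhood.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
T_est = Vt[:2].T                       # estimated tangent basis (3 x 2)

# True tangent space at P is span(e1, e2); largest principal angle
# between the two subspaces via singular values of the basis product.
T_true = np.eye(3)[:, :2]
s = np.linalg.svd(T_true.T @ T_est, compute_uv=False)
angle = np.arccos(np.clip(s.min(), -1.0, 1.0))
# For a small sampling width, the angle is close to zero.
```

Increasing the width makes the curvature terms dominate, which is exactly the trade-off the bounds above quantify.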
Geometry-Aware Neighborhood Search for Learning Local Models for Image Reconstruction
Local learning of sparse image models has proven very effective for solving
inverse problems in many computer vision applications. To learn such
models, the data samples are often clustered using the K-means algorithm with
the Euclidean distance as a dissimilarity metric. However, the Euclidean
distance may not always be a good dissimilarity measure for comparing data
samples lying on a manifold. In this paper, we propose two algorithms for
determining a local subset of training samples from which a good local model
can be computed for reconstructing a given input test sample, where we take
into account the underlying geometry of the data. The first algorithm, called
Adaptive Geometry-driven Nearest Neighbor search (AGNN), is an adaptive scheme
which can be seen as an out-of-sample extension of the replicator graph
clustering method for local model learning. The second method, called
Geometry-driven Overlapping Clusters (GOC), is a less complex nonadaptive
alternative for training subset selection. The proposed AGNN and GOC methods
are evaluated in image super-resolution, deblurring and denoising applications
and shown to outperform spectral clustering, soft clustering, and geodesic
distance based subset selection in most settings.
Comment: 15 pages, 10 figures and 5 tables.
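A toy illustration of the motivation (not the AGNN or GOC algorithms themselves): when two distinct structures lie close together in the ambient space, Euclidean nearest neighbors mix samples from both, whereas neighbors ranked by shortest-path distance on a k-nearest-neighbor graph respect the underlying geometry.

```python
import numpy as np

def geodesic_distances(X, k=2):
    """Shortest-path (approximate geodesic) distances on a k-nearest-
    neighbor graph, via Floyd-Warshall (fine for small sample sizes)."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        for j in np.argsort(D[i])[1:k + 1]:  # k nearest Euclidean neighbors
            G[i, j] = G[j, i] = D[i, j]
    for m in range(n):                       # Floyd-Warshall relaxation
        G = np.minimum(G, G[:, [m]] + G[[m], :])
    return G

# Two nearby one-dimensional structures, sampled densely along each arm.
lower = np.column_stack([np.arange(21) * 0.05, np.zeros(21)])
upper = np.column_stack([np.arange(21) * 0.05, np.full(21, 0.12)])
X = np.vstack([lower, upper])  # indices 0..20: lower arm, 21..41: upper arm

G = geodesic_distances(X, k=2)
q = 10                            # query in the middle of the lower arm
eucl = np.linalg.norm(X - X[q], axis=1)
eucl_nn = np.argsort(eucl)[1:6]   # includes a point from the other arm
geo_nn = np.argsort(G[q])[1:6]    # stays on the query's own arm
```

Selecting the training subset from `geo_nn` rather than `eucl_nn` keeps the local model consistent with the structure the query actually lies on.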
Locally Stationary Graph Processes
Stationary graph process models are commonly used in the analysis and
inference of data sets collected on irregular network topologies. While most of
the existing methods represent graph signals with a single stationary process
model that is globally valid on the entire graph, in many practical problems,
the characteristics of the process may be subject to local variations in
different regions of the graph. In this work, we propose a locally stationary
graph process (LSGP) model that aims to extend the classical concept of local
stationarity to irregular graph domains. We characterize local stationarity by
expressing the overall process as the combination of a set of component
processes such that the extent to which the process adheres to each component
varies smoothly over the graph. We propose an algorithm for computing LSGP
models from realizations of the process, and also study the approximation of
LSGPs locally with wide-sense stationary (WSS) processes. Experiments on signal
interpolation problems
show that the proposed process model provides accurate signal representations
competitive with the state of the art.
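As a hypothetical illustration of the model (not the estimation algorithm), a locally stationary signal on a path graph can be synthesized by blending two component stationary processes, each obtained by filtering white noise in the graph Fourier domain, with smooth membership functions; all filter and membership choices below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60

# Path graph and its combinatorial Laplacian eigendecomposition.
W = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
L = np.diag(W.sum(1)) - W
lam, U = np.linalg.eigh(L)

# Two component WSS processes: white noise shaped by a low-pass and a
# high-pass graph filter (functions of the Laplacian eigenvalues).
h_low = np.exp(-2.0 * lam)
h_high = 1.0 - np.exp(-2.0 * lam)
x1 = U @ (h_low * (U.T @ rng.standard_normal(n)))
x2 = U @ (h_high * (U.T @ rng.standard_normal(n)))

# Smooth membership functions forming a partition of unity over the
# graph: the process resembles the smooth component on the left part of
# the path and the rough component on the right.
m1 = 1.0 / (1.0 + np.exp((np.arange(n) - n / 2) / 5.0))
m2 = 1.0 - m1
x = m1 * x1 + m2 * x2
```

The estimation problem studied in the abstract is the inverse of this construction: recovering the components and memberships from realizations of x.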
Discretization of Parametrizable Signal Manifolds
Transformation-invariant analysis of signals often requires the computation
of the distance from a test pattern to a transformation manifold. In
particular, the estimation of the distances between a transformed query signal
and several transformation manifolds representing different classes provides
essential information for the classification of the signal. In many
applications the computation of the exact distance to the manifold is costly,
whereas an efficient practical solution is the approximation of the manifold
distance with the aid of a manifold grid. In this paper, we consider a setting
with transformation manifolds of known parameterization. We first present an
algorithm for the selection of samples from a single manifold that minimizes
the average error in the manifold distance estimation. Then we propose
a method for the joint discretization of multiple manifolds that represent
different signal classes, where we optimize the transformation-invariant
classification accuracy yielded by the discrete manifold representation.
Experimental results show that sampling each manifold individually by
minimizing the manifold distance estimation error outperforms baseline sampling
solutions with respect to registration and classification accuracy. Performing
an additional joint optimization on all samples improves the classification
performance further. Moreover, given a fixed total number of samples to be
selected from all manifolds, an asymmetric distribution of samples to different
manifolds depending on their geometric structures may also increase the
classification accuracy in comparison with the equal distribution of samples.
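A minimal sketch of grid-based manifold distance approximation, using a hypothetical translation manifold of Gaussian pulses and uniform parameter grids (the methods above optimize where the samples are placed; a uniform grid is the baseline they improve upon):

```python
import numpy as np

t = np.linspace(0.0, 1.0, 200)

def shifted_pulse(center):
    """A point on the transformation manifold: a translated Gaussian pulse."""
    return np.exp(-(t - center) ** 2 / 0.02)

def manifold_distance(query, centers):
    """Approximate the distance from `query` to the translation manifold
    by the distance to the nearest of a grid of sampled manifold points."""
    samples = np.stack([shifted_pulse(c) for c in centers])
    return np.min(np.linalg.norm(samples - query, axis=1))

query = shifted_pulse(0.43)     # a pattern lying on the manifold itself
coarse = manifold_distance(query, np.linspace(0.2, 0.8, 7))
fine = manifold_distance(query, np.linspace(0.2, 0.8, 61))
# The true manifold distance is zero; the finer grid gives the smaller
# approximation error.
```

Classification then assigns a query to the class whose sampled manifold yields the smallest such distance.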
Learning Multi-Modal Nonlinear Embeddings: Performance Bounds and an Algorithm
While many approaches exist in the literature to learn low-dimensional
representations for data collections in multiple modalities, the
generalizability of multi-modal nonlinear embeddings to previously unseen data
is a rather overlooked subject. In this work, we first present a theoretical
analysis of learning multi-modal nonlinear embeddings in a supervised setting.
Our performance bounds indicate that for successful generalization in
multi-modal classification and retrieval problems, the regularity of the
interpolation functions extending the embedding to the whole data space is as
important as the between-class separation and cross-modal alignment criteria.
We then propose a multi-modal nonlinear representation learning algorithm that
is motivated by these theoretical findings, where the embeddings of the
training samples are optimized jointly with the Lipschitz regularity of the
interpolators. Experimental comparison to recent multi-modal and single-modal
learning algorithms suggests that the proposed method yields promising
performance in multi-modal image classification and cross-modal image-text
retrieval applications.
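Once both modalities are embedded into a common space, cross-modal retrieval reduces to a nearest-neighbor search across modalities. A toy sketch under idealized alignment (synthetic embeddings standing in for learned ones, not the proposed algorithm):

```python
import numpy as np

rng = np.random.default_rng(2)

# n paired (image, text) samples whose embeddings nearly agree once both
# modalities are mapped into a common m-dimensional space.
n, m = 10, 4
img = rng.standard_normal((n, m))
txt = img + 0.01 * rng.standard_normal((n, m))  # well-aligned counterparts

def retrieve(query, database):
    """Cross-modal retrieval: index of the nearest database embedding."""
    return int(np.argmin(np.linalg.norm(database - query, axis=1)))

hits = sum(retrieve(img[i], txt) == i for i in range(n))
# With well-aligned embeddings, every image query retrieves its own
# text counterpart.
```

Poor cross-modal alignment or an irregular interpolator degrades exactly this retrieval step, which is what the performance bounds above formalize.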